The MATE workbench - An annotation tool for XML coded speech corpora

Authors

  • David McKelvie
  • Amy Isard
  • Andreas Mengel
  • Morten Baun Møller
  • Michael Grosse
  • Marion Klein
Abstract

This paper describes the design and implementation of the MATE workbench, a program which provides support for the annotation of speech and text. It provides facilities for flexible display and editing of such annotations, and complex querying of a resulting corpus. The workbench offers a more flexible approach than most existing annotation tools, which were often designed with a specific annotation scheme in mind. Any annotation scheme can be used with the MATE workbench, provided it is coded using XML markup linked to the speech signal. The workbench uses a transformation language to define specialised editors optimised for particular annotation tasks, with suitable display formats and allowable editing operations tailored to the task. The workbench is written in Java, which means that it is platform-independent. This paper outlines the design of the workbench software and compares it with other annotation programs.

Similar resources

The MATE Workbench Annotation Tool, a Technical Description

The MATE workbench is a tool which aims to simplify the tasks of annotating, displaying and querying speech or text corpora. It is designed to help humans create language resources, and to make it easier for different groups to use one another’s data, by providing one tool which can be used with many different annotation schemes. Any annotation scheme which can be converted to XML can be used w...

The Mate Workbench - a tool for annotating XML corpora

This paper describes the design and implementation of the MATE workbench, a program which provides support for flexible display and editing of XML annotations, and complex querying of a set of linked files. The workbench was designed to support the annotation of XML coded linguistic corpora, but it could be used to annotate any kind of data, as it is not dependent on any particular annotation s...

Query Language for Access to Speech Corpora

With more and more speech corpora at hand the unit selection technique is a promising approach in concatenative speech synthesis. What is missing are models of optimal parameters that sufficiently describe utterances to be produced and their corresponding counterparts in collections of speech data. Prior to this, existing corpora have to be annotated on possibly relevant linguistic and signal l...

The MATE workbench - a tool in support of spoken dialogue annotation and information extraction

The increasing variety and sophistication of spoken language dialogue systems (SLDSs) emphasises the need for tools in support of their development and evaluation as well as for appropriate evaluation criteria. In this paper we describe how the MATE workbench can be used during SLDSs development to efficiently produce corpus-based information on SLDSs and their components. The information retri...

A Framework for Multilevel linguistic Annotations

This article presents a 3-step model for multilayer annotations of corpora. Each kind of annotation for a textual corpora corresponds to a different view on the same document. This principle can be expressed first with a general relational model dedicated to the organisation of LR. This abstract model is then implemented as an application of the XML formalism for the encoding of large corpora. The ...

Journal:
  • Speech Communication

Volume 33

Publication date 2001